Common Kubernetes Deployment Issues and How to Fix Them
This document covers the most common Kubernetes deployment problems, their likely causes, and the steps to fix them.
1. ImagePullBackOff / ErrImagePull
Problem:
The pod cannot pull the container image.
Possible Causes:
- Incorrect image name or tag
- Image is hosted in a private registry without authentication
- Rate limiting on public registries (e.g., DockerHub)
How to Fix:
- Check the image name and tag using
kubectl describe pod <pod-name>
- If using a private registry, create and apply an imagePullSecret:
kubectl create secret docker-registry myregistrykey \
--docker-username=<user> \
--docker-password=<password> \
--docker-server=<registry>
Add to your pod or deployment:
imagePullSecrets:
- name: myregistrykey
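If every pod in the namespace pulls from the same private registry, you can also attach the secret to the service account so individual pod specs don't need it. Here `myregistrykey` is the secret created above, and `default` is the namespace's default service account:

```shell
# Attach the pull secret to the default service account;
# new pods in this namespace will then use it automatically.
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'
```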
2. CrashLoopBackOff
Problem:
The container is repeatedly crashing and restarting.
Possible Causes:
- The application inside the container is exiting unexpectedly
- Invalid configuration or environment variables
- Failing liveness or readiness probes
How to Fix:
- Retrieve logs with:
kubectl logs <pod-name> --previous
- Verify entrypoint, startup scripts, and environment variables
- Use initContainers if dependencies need to be initialized first
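As a sketch, an initContainer can block startup until a dependency is reachable. The service name `db`, port `5432`, and the busybox image are illustrative — substitute your actual dependency:

```yaml
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Block until the dependent service resolves and accepts connections
      command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  containers:
    - name: app
      image: myapp:latest  # illustrative image name
```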
3. Pending Pods
Problem:
Pods remain in a Pending state and are not scheduled.
Possible Causes:
- Requested resources (CPU, memory, GPU) exceed what’s available
- Taints on nodes prevent scheduling
- Affinity or anti-affinity rules cannot be satisfied
How to Fix:
- Describe the pod:
kubectl describe pod <pod-name>
- Adjust resource requests/limits or scale your cluster
- Add tolerations to your pod spec if needed:
tolerations:
- key: "example"
operator: "Exists"
effect: "NoSchedule"
4. Service Not Reachable
Problem:
A pod or service cannot connect to another service using its DNS name.
Possible Causes:
- Incorrect service selectors
- No matching pods (endpoints list is empty)
- DNS resolution issues
How to Fix:
- Check the service:
kubectl describe svc <svc-name>
- Make sure the pods have labels that match the service selector
- Verify the service has active endpoints:
kubectl get endpoints <svc-name>
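A quick way to test service DNS from inside the cluster is a throwaway pod (the busybox image is assumed to be available in your environment):

```shell
# Resolve the service name from inside the cluster; the pod is deleted on exit
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never \
  -- nslookup <svc-name>.<namespace>.svc.cluster.local
```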
5. ConfigMap or Secret Not Found
Problem:
Pod startup fails due to missing ConfigMap or Secret.
Possible Causes:
- The ConfigMap or Secret doesn't exist
- Typo in the resource name
- Resource exists in a different namespace
How to Fix:
- Ensure the ConfigMap or Secret is created in the correct namespace
- Validate the names and keys used in the pod spec
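To confirm the resource actually exists where the pod expects it, list and inspect it in the pod's namespace:

```shell
# List ConfigMaps and Secrets in the pod's namespace
kubectl get configmap,secret -n <namespace>

# Inspect the keys a ConfigMap actually contains
kubectl describe configmap <configmap-name> -n <namespace>
```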
6. Liveness or Readiness Probe Failures
Problem:
Probes fail, causing the pod to restart or remain unready.
Possible Causes:
- Application takes time to become ready
- Incorrect path or port specified
- Probes are too aggressive
How to Fix:
- Add initialDelaySeconds and adjust the probe intervals:
initialDelaySeconds: 10
periodSeconds: 5
- Validate that the health endpoint inside the container is working
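Putting these settings together, a liveness probe with a generous startup delay might look like this. Port 8080 and the /healthz path are assumptions — use your application's real health endpoint:

```yaml
livenessProbe:
  httpGet:
    path: /healthz           # assumed health endpoint
    port: 8080               # assumed container port
  initialDelaySeconds: 10    # give the app time to start
  periodSeconds: 5           # probe every 5 seconds
  failureThreshold: 3        # restart only after 3 consecutive failures
```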
7. Deployment Not Updating
Problem:
Changes to the deployment do not result in new pods being created.
Possible Causes:
- No actual changes in the pod template
- Deployment rollout is paused
How to Fix:
- Ensure the pod spec changes (e.g., image tag, env var)
- Force a rollout if needed:
kubectl rollout restart deployment <deployment-name>
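To check whether the rollout is stuck or paused, and resume it if so:

```shell
# Shows rollout progress, or reports that the deployment is paused
kubectl rollout status deployment/<deployment-name>

# Resume a paused rollout
kubectl rollout resume deployment/<deployment-name>
```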
8. PVC Pending
Problem:
PersistentVolumeClaim (PVC) remains in Pending state.
Possible Causes:
- No available PersistentVolume (PV)
- StorageClass does not exist or does not match
- Mismatch in requested size or access mode
How to Fix:
- Check available PVs and StorageClasses:
kubectl get pv
kubectl get storageclass
- Ensure a PV with matching size, access mode, and storage class exists
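A PVC only binds when its storage class, access mode, and size can all be satisfied. For example (the `standard` storage class name is an assumption — use one listed by `kubectl get storageclass`):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: standard  # must match an existing StorageClass
  accessModes:
    - ReadWriteOnce           # must be supported by the backing PV
  resources:
    requests:
      storage: 1Gi            # must fit within an available PV
```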
9. Node Pressure (Disk/CPU/Memory)
Problem:
Pods are evicted or not scheduled due to node resource constraints.
Possible Causes:
- Node is under resource pressure
- Kubelet evicts pods when thresholds are breached
How to Fix:
- Inspect node status:
kubectl describe node <node-name>
- Free up resources or reschedule workloads
- Tune kubelet eviction settings if necessary
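To see which pressure condition the kubelet has reported and what the nodes are actually using (`kubectl top` requires metrics-server to be installed):

```shell
# Conditions such as MemoryPressure or DiskPressure appear here
kubectl describe node <node-name> | grep -A5 Conditions

# Current CPU/memory usage per node (requires metrics-server)
kubectl top nodes
```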
10. RBAC Permission Denied
Problem:
Service account or user is denied permission for an action.
Possible Causes:
- Missing Role or ClusterRole
- No RoleBinding or ClusterRoleBinding assigned
How to Fix:
- Test access using:
kubectl auth can-i get pods --as=system:serviceaccount:<namespace>:<service-account>
- Apply the appropriate Role or ClusterRole and bind it to the service account
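A minimal Role and RoleBinding granting a service account read access to pods; the names and namespace are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-namespace
rules:
  - apiGroups: [""]           # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-service-account  # the account that was denied
    namespace: my-namespace
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```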